Smart near-real-time folder scan based on a breadth first search

ABSTRACT

In response to a folder event received for a first folder, a first work item is dequeued from an ID queue, and metadata of the first folder and of its immediate children is fetched and enqueued as work items in a metadata queue. If further children of the first folder remain to be scanned, the first work item is updated with a child ID for each immediate child of the first folder that is a folder, and the updated work item is inserted into the ID queue. In a second pass, a child ID is dequeued, and metadata of the immediate children of the folder associated with that child ID is fetched and enqueued as work items in the metadata queue. The second pass is repeated for all child IDs in the updated work item. This process is repeated for each generation of children of the first folder or until a specified limit is met.

CROSS-REFERENCE TO RELATED APPLICATION

This application is a continuation of pending U.S. patent application Ser. No. 17/154,833, entitled “SMART NEAR-REAL-TIME FOLDER SCAN BASED ON A BREADTH FIRST SEARCH,” which was filed on Jan. 21, 2021, and is herein incorporated by reference in its entirety.

BACKGROUND

Applications, resources, and data may be accessible to computers over one or more networks at one or more servers. Storage devices at such servers may be referred to as “network based storage,” and the servers may be referred to as “network based servers.” “Cloud storage” (also known as “cloud based storage”) is a form of network based storage where data can be stored at and accessed from remote storage devices at servers that are accessible over the Internet. “Cloud computing” refers to the on-demand availability, over the Internet, of computer system resources (e.g., applications, services, processors, storage devices, file systems, databases, etc.) and of data stored in cloud storage. Servers hosting cloud based resources may be referred to as “cloud based servers” (or “cloud servers”).

Various cloud-based file systems (e.g., SharePoint®, DropBox®, Google® Drive, etc.) provide file hosting services such as file storage, file synchronization, document management, etc. These services allow users to create, access, edit, share, and/or collaborate with other users on files and folders. Cloud application security services (e.g., data loss prevention services) may monitor users' activities with respect to the files and folders and generate analytics to identify cyberthreats and control how data travels in the cloud.

An activity that involves accessing a file or a folder may prompt the security service to review the activity against policies that apply to the accessed file or folder. For example, a policy may be configured that excludes certain users from collaborating on a file. The security service may track activities occurring in a customer's cloud account and detect a breach of the policy by an unauthorized user accessing the protected file. The security system may then notify the customer that the policy has been violated.

SUMMARY

This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.

Methods performed by a computer program executing on a computing device, and systems that are configured to perform such methods, are described herein. In one aspect, a method is performed in a cloud server for enqueueing metadata of children of a first folder associated with a folder event. The method includes, in a first pass: dequeuing, from a first persistent queue, a first work item associated with the folder event and fetching first metadata of the first folder and of immediate children of the first folder. The immediate children comprise at least one of a second folder or a file. The method further includes creating a second work item comprising the fetched first metadata and enqueuing the second work item to a second persistent queue. In response to determining that the first pass did not complete scanning children of the first folder, the first work item is updated for each second folder of the immediate children by adding a child folder identifier (ID) into an internal queue of the first work item. The updated first work item is inserted into the first persistent queue. In a second pass, the method further includes dequeuing, from the first persistent queue, a child folder ID of the internal queue for a second folder of the immediate children. Second metadata of immediate children of the second folder of the immediate children of the first folder is fetched based on the child folder ID. A third work item is created, where the third work item comprises the fetched second metadata, and the third work item is enqueued to the second persistent queue.

Further features and advantages of embodiments, as well as the structure and operation of various embodiments, are described in detail below with reference to the accompanying drawings. It is noted that the methods and systems are not limited to the specific embodiments described herein. Such embodiments are presented herein for illustrative purposes only. Additional embodiments will be apparent to persons skilled in the relevant art(s) based on the teachings contained herein.

BRIEF DESCRIPTION OF THE DRAWINGS

The accompanying drawings, which are incorporated herein and form a part of the specification, illustrate embodiments of the present application and, together with the description, further serve to explain the principles of the embodiments and to enable a person skilled in the pertinent art to make and use the embodiments.

FIG. 1 is a block diagram of a cloud based folder scanning system for queuing file and folder metadata from a cloud based file hosting system in a persistent queue in response to a folder event, according to an example embodiment.

FIG. 2 is a flowchart of a method in a cloud server for performing a breadth first search for file and folder metadata in response to a folder event, and enqueuing the metadata in a persistent queue, according to an example embodiment.

FIG. 3 is a block diagram of a system with a VM node configured to perform a breadth first search for file and folder metadata in response to a folder event, and enqueue the metadata in a persistent queue, according to an example embodiment.

FIG. 4 is a flowchart of a method in a VM node for performing a breadth first search for file and folder metadata in a file hosting system, and enqueuing the metadata in a persistent queue, according to an example embodiment.

FIG. 5 is a flowchart of a method for fetching metadata of a folder and immediate children of the folder, according to an example embodiment.

FIG. 6 is a flowchart of a method for fetching metadata in multiple sets of the metadata, according to an example embodiment.

FIG. 7 is a flowchart comprising a step for enqueueing metadata of children of a first folder associated with a folder event, according to an example embodiment.

FIG. 8 is a flowchart of a method for receiving a folder event from a cloud based file hosting system, according to an example embodiment.

FIG. 9 is a flowchart of a method for queueing metadata of a folder or a file, according to an example embodiment.

FIG. 10 is a flowchart of a method for repeating steps for enqueuing metadata associated with a folder event, according to an example embodiment.

FIG. 11 is a flowchart of a method for repeating steps for enqueuing metadata associated with a folder event for each generation of children of a first folder or until a specified limit is met, according to an example embodiment.

FIG. 12 is a flowchart of a method for limiting file and folder scans for fetching metadata, according to an example embodiment.

FIG. 13 is a block diagram of an example processor-based computer system that may be used to implement various embodiments.

The features and advantages of the embodiments described herein will become more apparent from the detailed description set forth below when taken in conjunction with the drawings, in which like reference characters identify corresponding elements throughout. In the drawings, like reference numbers generally indicate identical, functionally similar, and/or structurally similar elements. The drawing in which an element first appears is indicated by the leftmost digit(s) in the corresponding reference number.

DETAILED DESCRIPTION

I. Introduction

The present specification and accompanying drawings disclose one or more embodiments that incorporate the features of the disclosed embodiments. The scope of the embodiments is not limited only to the aspects disclosed herein. The disclosed embodiments merely exemplify the intended scope, and modified versions of the disclosed embodiments are also encompassed. Embodiments are defined by the claims appended hereto.

References in the specification to “one embodiment,” “an embodiment,” “an example embodiment,” etc., indicate that the embodiment described may include a particular feature, structure, or characteristic, but every embodiment may not necessarily include the particular feature, structure, or characteristic. Moreover, such phrases are not necessarily referring to the same embodiment. Further, when a particular feature, structure, or characteristic is described in connection with an embodiment, it is submitted that it is within the knowledge of one skilled in the art to effect such feature, structure, or characteristic in connection with other embodiments whether or not explicitly described.

Furthermore, it should be understood that spatial descriptions (e.g., “above,” “below,” “up,” “left,” “right,” “down,” “top,” “bottom,” “vertical,” “horizontal,” etc.) used herein are for purposes of illustration only, and that practical implementations of the structures described herein can be spatially arranged in any orientation or manner.

In the discussion, unless otherwise stated, adjectives such as “substantially” and “about” modifying a condition or relationship characteristic of a feature or features of an embodiment of the disclosure are understood to mean that the condition or characteristic is defined to within tolerances that are acceptable for operation of the embodiment for an application for which it is intended.

Numerous exemplary embodiments are described as follows. It is noted that any section/subsection headings provided herein are not intended to be limiting. Embodiments are described throughout this document, and any type of embodiment may be included under any section/subsection. Furthermore, embodiments disclosed in any section/subsection may be combined with any other embodiments described in the same section/subsection and/or a different section/subsection in any manner.

II. Example Embodiments

As described above, cloud application security systems may offer data loss prevention services for customers. The present disclosure provides methods and systems for a near-real-time folder scanner that may be utilized for data loss prevention services or other applications. Such a scanner may track activities in file hosting systems that occur in customers' cloud accounts (e.g., in software as a service (SaaS) accounts). The tracked activities may comprise events received for files and/or folders, such as for sharing, editing, creating, or adding a collaborator to a file and/or a folder. For example, a folder event may trigger a status update process on the folder itself and its children (e.g., direct and indirect children including one or more generations of sub-files and/or sub-folders). In some embodiments, a folder may be associated with a vast number of sub-files and/or sub-folders. Scanning such folder children may consume considerable resources in terms of processing time and the number of application programming interface (API) calls made to the cloud based file hosting system for metadata. To avoid potential latency, the depth of a scan may be limited to a certain number of generations of children, or the number of files allowed to be scanned in response to a folder event may be limited, after which the scanning process is stopped. In some embodiments, the process may continue to complete the scanning of files in a folder even after the number of files allowed to be scanned has been reached.

To handle high loads of folder events (e.g., thousands of events per second) from the same file hosting service account in parallel (e.g., with each event consuming the same bandwidth of the file hosting service), a breadth first search (BFS) process may be divided into small steps. For example, every iteration of the BFS process may update a work item state and re-send the updated work item to a queue, in order to distribute the required API calls over time and improve fairness among the different near-real-time events. Moreover, the limitations (e.g., scan depth, total count of files in a scan, etc.) may be specified and may be adjusted dynamically in order to reduce latency in handling a near-real-time event. These technical improvements allow fast and full coverage of the scope of near-real-time events that are created in customers' cloud accounts.
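The step-wise, re-queuing approach can be illustrated with a minimal Python sketch. It assumes an in-memory `deque` in place of the persistent ID queue and a hypothetical dictionary `trees` (folder ID to child folder IDs) in place of API calls to a file hosting system. Each dequeue scans a single folder, and an unfinished work item is re-sent to the back of the queue, so that two folder events make progress in turn. The function name `interleave_bfs_steps` is illustrative only.

```python
from collections import deque


def interleave_bfs_steps(trees, roots):
    """Scan several folder trees one BFS step at a time.

    trees: hypothetical stand-in for a file hosting system, mapping a folder ID
        to the list of its child folder IDs.
    roots: one root folder ID per folder event.
    Returns the order in which folders were scanned.
    """
    # Shared ID queue; each work item is (root ID, folders still to scan).
    id_queue = deque((root, deque([root])) for root in roots)
    scan_order = []
    while id_queue:
        root, pending = id_queue.popleft()
        folder = pending.popleft()              # one small step: scan one folder
        scan_order.append(folder)
        pending.extend(trees.get(folder, []))   # record child folders for later steps
        if pending:                             # unfinished: re-send updated item
            id_queue.append((root, pending))
    return scan_order


trees = {"A": ["A1", "A2"], "A1": ["A11"], "B": ["B1"]}
print(interleave_bfs_steps(trees, ["A", "B"]))
# ['A', 'B', 'A1', 'B1', 'A2', 'A11']
```

Because the work item is re-queued after every step, the folders of event B are scanned between the folders of event A rather than only after A's entire tree has been walked, while each tree is still visited in breadth first order.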

A cloud server may comprise one or more virtual machine (VM) nodes that may function as file and/or folder scanners. The VM node(s) may be configured to detect file and/or folder events received from one or more third party systems (e.g., cloud based file hosting services). Generally speaking, the VM node(s) may convert an event into a work item that may be placed in a task queue (i.e., a first persistent queue or identifier (ID) queue). If the event represents only file based activities, the event may include metadata corresponding to the affected file(s), or a VM node may fetch metadata associated with the file(s). For example, the VM node may issue an API call to the file hosting system that reported the event and, in response, receive the associated metadata from the file hosting system. The VM node may then create a second work item comprising the metadata (i.e., a metadata work item) and place it in a test queue (e.g., in a second persistent queue or metadata queue). If the event represents folder activity (i.e., root folder activity), the VM node may scan the root folder and children of the root folder, including any files and/or folders that may be located inside the root folder. In other words, the VM node may fetch metadata for the root folder and the children of the root folder, create a second work item comprising the fetched metadata (i.e., a metadata work item), and place the second work item in the second persistent queue. For each child folder of the root folder, the same process is performed, including: fetching metadata of the children of the child folder (including file and/or folder metadata), placing the fetched metadata in a metadata work item, and enqueuing the metadata work item in the second persistent queue. Once this process is completed for each immediate child folder of the root folder, the process may be repeated for each successive generation of folder children.
However, the scanning process may be throttled or halted before the entire structure of sub-folders and sub-files is scanned, according to limitation parameters that may be configured in the system. For example, the depth of scanning from a root folder into levels or generations of children may be limited, or a limit on the number of files allowed to be scanned for a folder event may be configured in the system. In some embodiments, the limits may be dynamically adjusted by the system to allow for completion of a full scan of a sub-folder (e.g., to avoid a partial folder scan). Moreover, scans for sub-files and/or sub-folders of a root folder may be broken down into multiple iterations of smaller scans, where each iteration has a specified number of allowed API calls. This limitation on API calls may be imposed by the file hosting system. In this manner, a breadth first search (BFS) method may be implemented in the folder scanning process while system resources, such as processing time and the number of API calls sent to the file hosting systems, and potential latency in handling other events are managed with the configured or dynamically adjusted limitations.
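One way to express the configured limits, including the dynamic extension that avoids a partial folder scan, is the following minimal Python sketch. The names `ScanLimits`, `files_to_scan`, and `finish_folder` are hypothetical, and the policy of always finishing a folder once its scan has started is an assumption used for illustration.

```python
from dataclasses import dataclass


@dataclass
class ScanLimits:
    max_depth: int              # generations of children below the root folder
    max_files: int              # files allowed to be scanned per folder event
    finish_folder: bool = True  # allow a folder already started to be completed


def files_to_scan(folder_files, depth, files_scanned, limits):
    """Return the files of one folder that may be scanned under the limits."""
    if depth > limits.max_depth:
        return []                          # folder lies beyond the depth limit
    remaining = limits.max_files - files_scanned
    if remaining <= 0:
        return []                          # file budget already used up
    if limits.finish_folder:
        # Dynamically extend the file budget so this folder is scanned fully.
        return list(folder_files)
    return list(folder_files)[:remaining]  # strict budget: may truncate the folder


limits = ScanLimits(max_depth=2, max_files=5)
print(files_to_scan(["a", "b", "c"], depth=1, files_scanned=4, limits=limits))
# ['a', 'b', 'c']
```

With `finish_folder=False`, the same call would return only `['a']`, leaving the folder partially scanned.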

As described above, the structure within a folder may be complex. The folder may comprise files and/or inner folders (i.e., sub-files and/or sub-folders), and those inner folders might also comprise files and inner folders and so on (e.g., as in a tree structure). Therefore, the scanning process described herein may walk through multiple generations of children of a root folder to find the sub-files and sub-folders, and create new metadata work items for each of them, to be enqueued to the second persistent queue (i.e., the metadata queue). By placing the metadata work items into the second persistent queue, the metadata work items may be available to be fed into a pipeline, or may be otherwise processed, to examine the metadata and detect possible policy violations associated with a file or folder, for example, by a security application such as a data loss prevention application.

In general, a user may take an action with respect to a file or folder of a cloud based file hosting system, and the action may trigger an event that may be sent to a VM node. If it is a folder event, the VM node may walk through a tree structure comprising multiple levels of sub-folders and/or sub-files, gather metadata for each sub-folder and sub-file, and enqueue corresponding metadata work items to the second persistent queue. Policies may then be tested against the metadata to determine policy violations. For example, if a policy indicates that only users listed as collaborators of a folder are allowed to change the content of a file in the folder, the security system may compare the users listed as collaborators in the metadata to a user reported in an event as having changed the content of the file, determine that the user is not a collaborator, and report to a customer that a policy violation occurred.
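The collaborator policy in this example can be sketched in a few lines of Python. This is an illustration only; the metadata and event field names (`collaborators`, `user`, `action`, `file_id`) and the function name `check_collaborator_policy` are hypothetical and not defined by any particular file hosting system.

```python
def check_collaborator_policy(folder_metadata, event):
    """Return a violation message if a non-collaborator changed a file, else None.

    folder_metadata and event use hypothetical field names for illustration.
    """
    collaborators = set(folder_metadata.get("collaborators", []))
    user = event["user"]
    if event["action"] == "edit" and user not in collaborators:
        return f"Policy violation: {user} edited {event['file_id']} but is not a collaborator"
    return None


metadata = {"id": "folder-1", "collaborators": ["alice", "bob"]}
print(check_collaborator_policy(metadata, {"user": "mallory", "action": "edit", "file_id": "report.docx"}))
# Policy violation: mallory edited report.docx but is not a collaborator
```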

Embodiments for near-real-time folder scans based on a breadth first search may be implemented in various ways. For example, FIG. 1 is a block diagram of a cloud based folder scanning system 100 for queuing file and folder metadata from a cloud based file hosting system in a persistent queue in response to a folder event, according to an example embodiment. As shown in FIG. 1, system 100 includes a cloud server 102 and three cloud based file hosting systems 140. Cloud server 102 includes virtual machine node 110, which comprises first persistent queue 120 and second persistent queue 130. Each of the cloud based file hosting systems 140 comprises an application programming interface (API) service 142. System 100 is described in detail as follows.

Note that although embodiments are described herein in a cloud computing context (e.g., with respect to cloud based servers, etc.), embodiments are also applicable to other network based implementations.

As described in more detail below with respect to FIG. 13, cloud server 102 and each of the cloud based file hosting systems 140 may comprise any suitable computing device, such as a stationary computing device (e.g., a desktop computer or personal computer), a mobile computing device (e.g., a Microsoft® Surface® device, a personal digital assistant (PDA), a laptop computer, a notebook computer, a tablet computer such as an Apple iPad™, a netbook, etc.), a mobile phone (e.g., a cell phone, a smart phone such as an Apple iPhone, a phone implementing the Google® Android™ operating system, a dual screen phone, a Microsoft® Windows phone, etc.), a wearable computing device (e.g., a head-mounted device including smart glasses such as Google® Glass™, Oculus Rift® by Oculus VR, LLC, etc.), a gaming console/system (e.g., Nintendo Switch®, etc.), an appliance, a set top box, etc.

Cloud server 102 may comprise a plurality of VM nodes that are similar or substantially the same as VM node 110, and each of the nodes may be referred to as VM node 110. VM node 110 may be communicatively coupled to, and may provide file and folder scanning services to, one or more of the cloud based file hosting systems 140. For example, VM node 110 may be configured to track activities performed on files and/or folders that are stored in file hosting systems 140 (e.g., sharing, editing, creating, deleting, or adding a collaborator to a file or a folder). Such an activity, reported as a folder event, may trigger a status update on the folder that underwent the activity and on its direct and indirect children (i.e., direct or indirect sub-folders and sub-files). In this regard, VM node 110 may be configured to receive file and folder events from one or more cloud based file hosting systems 140. For example, the events may be received in response to a request for events that is submitted via an API call to an API service 142. The request for events may be limited to events that occurred during a specified time frame (e.g., the last 30 minutes). Moreover, VM node 110 may be configured to fetch file and/or folder metadata utilizing API calls to one or more API services 142 of the cloud based file hosting systems 140.
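The time-frame restriction on a request for events can be illustrated with a small Python filter. In practice the window would presumably be passed to API service 142 as part of the request; this sketch only shows the intended semantics on a local list of events, and the `timestamp` and `id` fields and the function name `events_in_window` are hypothetical.

```python
from datetime import datetime, timedelta


def events_in_window(events, now, window=timedelta(minutes=30)):
    """Return the events whose timestamp lies within `window` before `now`."""
    start = now - window
    return [e for e in events if start <= e["timestamp"] <= now]


now = datetime(2021, 1, 21, 12, 0)
events = [
    {"id": "e1", "timestamp": now - timedelta(minutes=10)},
    {"id": "e2", "timestamp": now - timedelta(minutes=45)},
]
print([e["id"] for e in events_in_window(events, now)])
# ['e1']
```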

First persistent queue 120 may be configured to store raw file and/or folder event information that may be received from a cloud based file hosting system 140. For example, first persistent queue 120 may store work items that include enough information about a file or a folder (e.g., a file ID or folder ID) to perform a scan in a cloud based file hosting system 140 for metadata associated with the file, folder, and/or children of the folder. The work items of first persistent queue 120 may be updated with folder ID information of folder children that is received from a cloud based file hosting system 140 when the metadata is retrieved for the folder children. First persistent queue 120 may be referred to as ID queue 120.

Second persistent queue 130 may be configured to store enriched file and/or folder event information comprising file and/or folder metadata. The metadata may be retrieved from cloud based file hosting system 140 during a scan, for work items stored in first persistent queue 120. Second persistent queue 130 may be referred to as metadata queue 130.

Virtual machine node 110 may operate in various ways to perform its functions. For instance, FIG. 2 is a flowchart 200 of a method in a cloud server for performing a breadth first search for file and folder metadata in response to a folder event, and enqueuing the metadata in a persistent queue, according to an example embodiment. In an embodiment, VM node 110 may be configured to operate according to flowchart 200. Flowchart 200 is described as follows with reference to FIGS. 1 and 3.

FIG. 3 is a block diagram of a system 300 with a VM node configured to perform a breadth first search for file and folder metadata in response to a folder event, and enqueue the metadata in a persistent queue, according to an example embodiment. As shown in FIG. 3, system 300 comprises VM node 110. VM node 110 comprises first persistent queue 120, second persistent queue 130, a queue manager 340, an API manager 350, and scan limit parameters 360. First persistent queue 120 may comprise a first work item 320 that includes a first folder ID 322 or an internal queue 324. Internal queue 324 comprises one or more child folder IDs 326. Second persistent queue 130 may comprise a second work item 330 comprising first metadata 332 and a third work item 334 comprising second metadata 336. In some embodiments, system 300 may be implemented in system 100. For purposes of illustration, system 300 is described in detail as follows with respect to flowchart 200 of FIG. 2.
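The two kinds of work items shown in FIG. 3 can be modeled with Python dataclasses, as in the following sketch. The class names `IdWorkItem` and `MetadataWorkItem` and the JSON message format are assumptions for illustration; the comments map each field to the corresponding reference numeral.

```python
import json
from collections import deque
from dataclasses import dataclass, field


@dataclass
class IdWorkItem:
    """Work item of the ID queue, e.g., first work item 320."""
    folder_id: str                                          # first folder ID 322
    child_folder_ids: deque = field(default_factory=deque)  # internal queue 324

    def to_json(self):
        # Assumes the persistent queue stores serialized JSON messages.
        return json.dumps({"folder_id": self.folder_id,
                           "child_folder_ids": list(self.child_folder_ids)})

    @staticmethod
    def from_json(text):
        data = json.loads(text)
        return IdWorkItem(data["folder_id"], deque(data["child_folder_ids"]))


@dataclass
class MetadataWorkItem:
    """Work item of the metadata queue, e.g., second work item 330 or third work item 334."""
    metadata: dict  # first metadata 332 or second metadata 336


item = IdWorkItem("root", deque(["child-1", "child-2"]))
restored = IdWorkItem.from_json(item.to_json())
print(restored == item)
# True
```

Serializing the internal queue together with the folder ID lets the whole scan state travel with the work item each time it is re-inserted into the ID queue.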

Flowchart 200 begins with step 202. In step 202, a folder event comprising at least a folder ID of a first folder is received from a cloud-based file system. For example, an action may be performed on a folder that is stored in cloud based file hosting system 140. The folder (i.e., the first folder) may be associated with first folder ID 322. In response to the action, cloud based file hosting system 140 may issue a folder event to cloud server 102 comprising first folder ID 322. In some embodiments, API manager 350 may transmit a request for events via an API call to API service 142. Queue manager 340 may be configured to receive the event and create first work item 320 comprising first folder ID 322. Queue manager 340 may be further configured to enqueue first work item 320 in first persistent queue 120 (i.e., ID queue 120). First folder ID 322 may be utilized to scan cloud based file hosting system 140 for metadata of the first folder (associated with first folder ID 322) and metadata of any immediate children of the first folder (e.g., children comprising sub-folders and/or sub-files of the first folder).

In step 204, a first work item associated with the folder event may be dequeued from a first persistent queue. For example, queue manager 340 may be configured to dequeue first work item 320 from first persistent queue 120 and provide first folder ID 322 to API manager 350.

In step 206, metadata of the first folder and of the immediate children of the first folder may be fetched via an API service. For example, API manager 350 may be configured to transmit a request comprising first folder ID 322 to API services 142 in cloud based file hosting system 140, requesting metadata associated with the first folder identified by first folder ID 322. API services 142 may be configured to transmit, to API manager 350, first metadata 332 associated with the first folder and respective first metadata 332 associated with each of the immediate children of the first folder, including sub-files and/or sub-folders, if any.

In step 208, for the first folder and each immediate child of the first folder, respective metadata is enqueued in a second persistent queue. For example, queue manager 340 is configured to create a respective second work item 330 for each of the first folder and each of the immediate children of the first folder, and insert respective first metadata 332 of each of the first folder and each of the immediate children of the first folder into each corresponding second work item 330. Queue manager 340 is further configured to enqueue each second work item 330 comprising respective first metadata 332 into second persistent queue 130.
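Steps 204 through 208 can be sketched in Python as follows. This is a minimal illustration, assuming `deque` objects in place of the persistent queues and a hypothetical in-memory dictionary `FAKE_API` in place of API services 142; the helper names are likewise hypothetical.

```python
from collections import deque

# Hypothetical in-memory stand-in for API service 142: folder ID ->
# (metadata of the folder, metadata of each immediate child).
FAKE_API = {
    "root": ({"id": "root", "type": "folder"},
             [{"id": "notes.txt", "type": "file"},
              {"id": "sub", "type": "folder"}]),
}


def fetch_folder_and_children(folder_id):
    """Stand-in for an API call returning a folder and its immediate children."""
    return FAKE_API[folder_id]


def first_pass(id_queue, metadata_queue):
    work_item = id_queue.popleft()                          # step 204
    folder_md, children_md = fetch_folder_and_children(
        work_item["folder_id"])                             # step 206
    for md in [folder_md, *children_md]:                    # step 208
        metadata_queue.append({"metadata": md})  # one metadata work item each
    return work_item, children_md


id_queue = deque([{"folder_id": "root"}])
metadata_queue = deque()
first_pass(id_queue, metadata_queue)
print([w["metadata"]["id"] for w in metadata_queue])
# ['root', 'notes.txt', 'sub']
```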

In step 210, if there are additional children descended from the first folder to scan, flowchart 200 proceeds to step 212. For example, if queue manager 340 determines that any of the immediate children of the first folder is a folder that may have sub-folders and/or sub-files, and a limit has not been reached, flowchart 200 proceeds to step 212. Otherwise, flowchart 200 ends at step 218.

As described in more detail below, the depth of scanning generations of children of the first folder, or the number of files scanned during flowchart 200, may be limited according to configured settings in order to control the amount of resources consumed by flowchart 200.

In step 212, second metadata is fetched via the API service. For example, API manager 350 may be configured to transmit metadata requests, one folder at a time, to API services 142 of cloud based file hosting system 140 until a limit is met. Each metadata request may indicate a child folder ID 326 that is associated with a folder-type child among the immediate children of the first folder. API services 142 may be configured to respond to each request by transmitting the requested second metadata 336. Each instance of second metadata 336 may be associated with the sub-files and/or sub-folders, if any, of a folder-type child among the immediate children of the first folder.

In step 214, the respective fetched second metadata is enqueued in the second persistent queue. For example, queue manager 340 is configured to create a third work item 334 for the respective second metadata of each child of each folder among the immediate children of the first folder for which metadata was retrieved, and insert each respective second metadata 336 into a third work item 334. Queue manager 340 is further configured to enqueue each third work item 334 into second persistent queue 130.

In step 216, as described in more detail below, fetching and enqueuing of metadata may be repeated for additional children of the first folder. For example, as long as additional generations of child folders of the first folder are available to scan, and a limit for scanning has not been met, API manager 350 may be configured to fetch metadata of files and folders, create a metadata work item for each file and folder, enqueue the metadata work item(s) in second persistent queue 130, and update first work item 320 with additional child folder IDs 326.
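The loop formed by steps 210 through 216 can be summarized as a breadth first traversal with a depth limit and a file limit. The following Python sketch is illustrative only: `scan_folder_event` and `list_children` are hypothetical names, a plain list stands in for second persistent queue 130, and the file limit is checked once per folder so that a folder already being scanned is completed.

```python
from collections import deque


def scan_folder_event(root_id, list_children, max_depth, max_files):
    """Breadth first scan that enqueues one metadata work item per file or folder.

    list_children(folder_id) returns (child_id, is_folder) pairs and stands in
    for the API call that fetches a folder's immediate children.
    """
    metadata_queue = [{"id": root_id, "folder": True}]  # metadata of the first folder
    frontier = deque([(root_id, 0)])                    # (folder ID, generation)
    files_scanned = 0
    while frontier:
        folder_id, depth = frontier.popleft()
        # Step 210: stop once a limit is reached. BFS dequeues folders in
        # non-decreasing depth order, so every remaining folder is at least as deep.
        if depth >= max_depth or files_scanned >= max_files:
            break
        for child_id, is_folder in list_children(folder_id):             # step 212
            metadata_queue.append({"id": child_id, "folder": is_folder})  # step 214
            if is_folder:
                frontier.append((child_id, depth + 1))  # scanned in a later generation
            else:
                files_scanned += 1
    return metadata_queue


tree = {"r": [("a", True), ("f1", False)],
        "a": [("b", True), ("f2", False)],
        "b": [("f3", False)]}
items = scan_folder_event("r", lambda fid: tree.get(fid, []), max_depth=1, max_files=100)
print([i["id"] for i in items])
# ['r', 'a', 'f1']
```

Raising `max_depth` lets the scan reach later generations (`b`, `f2`, and then `f3`), while a small `max_files` stops the scan after the folder in which the limit was crossed.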

As a result of performing flowchart 200, second persistent queue 130 may comprise a work item for each sub-file and each sub-folder for one or more generations of children of the first folder. Each of the metadata work items may comprise metadata associated with its respective sub-file or sub-folder. In some embodiments, the work items of second persistent queue 130 may be dequeued and fed into a pipeline that may utilize the metadata for various functions. For example, the activity performed on the first folder and/or one or more generations of children of the first folder may be analyzed and compared to policies to find policy violations.

In some embodiments, rather than receiving a folder event for activities detected in a first folder that is stored in cloud based file hosting system 140 (as described in step 202), VM node 110 may receive a file event for activities detected with respect to a first file stored in cloud based file hosting system 140. Since the first file does not have any children, the method may have a single iteration of fetching metadata from cloud based file hosting system 140. For example, queue manager 340 may be configured to create a first work item comprising a first file ID for the first file and enqueue the first work item in first persistent queue 120. Queue manager 340 may dequeue the first work item from first persistent queue 120, and API manager 350 may request metadata of the first file based on the first file ID in the first work item. API services 142 may access the requested metadata based on the first file ID and transmit the requested metadata to API manager 350. Queue manager 340 may be configured to enqueue the metadata of the first file into a work item in second persistent queue 130. No further repetitions are required since the first file does not have children.
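The single-iteration handling of a file event can be sketched as follows. The function name `handle_file_event`, the `file_id` field, and the `fetch_file_metadata` callable (a stand-in for the API request to API services 142) are hypothetical, and `deque` objects stand in for the persistent queues.

```python
from collections import deque


def handle_file_event(file_id, fetch_file_metadata, id_queue, metadata_queue):
    """Process a file event in a single iteration (a file has no children)."""
    id_queue.append({"file_id": file_id})                 # create and enqueue first work item
    work_item = id_queue.popleft()                        # dequeue it
    metadata = fetch_file_metadata(work_item["file_id"])  # one metadata request
    metadata_queue.append({"metadata": metadata})         # enqueue the metadata work item
    # No further passes are needed because a file has no children.


id_queue, metadata_queue = deque(), deque()
handle_file_event("doc-42", lambda fid: {"id": fid, "owner": "alice"}, id_queue, metadata_queue)
print(len(id_queue), metadata_queue[0]["metadata"]["id"])
# 0 doc-42
```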

VM node 110 may operate in various ways to perform its functions. For instance, FIG. 4 is a flowchart 400 of a method in a VM node for performing a breadth first search for file and folder metadata in a file hosting system, and enqueuing the metadata in a persistent queue, according to an example embodiment. In an embodiment, VM node 110 may operate according to flowchart 400. Flowchart 400 is described as follows with reference to FIGS. 1 and 3.

Flowchart 400 begins with step 402. Step 402 is a start step for a first pass in flowchart 400. For example, aspects of one or more steps in the first pass may be repeated in one or more additional passes.

In step 404, a first work item that is associated with a folder event is dequeued from a first persistent queue. For example, a first folder (or root folder) is stored in cloud based file hosting system 140. The first folder may have immediate children that may comprise one or more folders and/or one or more files. Each immediate child that is a folder may comprise another generation of children of the first folder (e.g., grandchildren of the first folder), which may include one or more files and/or one or more folders, and so on for any additional generations of files and/or folders descended from the first folder. First work item 320 may be associated with a folder event that occurred relative to the first folder, where an activity associated with the first folder may have triggered the folder event. First work item 320 may comprise first folder ID 322, which may be associated with the first folder and received in the folder event. Queue manager 340 may dequeue first work item 320 from first persistent queue 120.

In step 406, first metadata of the first folder and of immediate children of the first folder is fetched, where the immediate children comprise at least one of a second folder or a file. For example, API manager 350 may be configured to access API services 142 of cloud based file hosting system 140 and utilize first folder ID 322 to request metadata that is associated with the first folder and metadata associated with each of the immediate children of the first folder. The immediate children may comprise sub-files and/or sub-folders relative to the first folder. Each of the immediate children of the first folder that is a folder may be referred to as a second folder. API manager 350 may be configured to receive the requested metadata from API services 142.

In step 408, a second work item comprising the fetched first metadata may be created. For example, for the first folder, and for each of the immediate children of the first folder, queue manager 340 may be configured to create a respective second work item 330. Each of the respective second work items 330 may include fetched metadata, shown as first metadata 332, that corresponds to the first folder or to a respective immediate child of the first folder.

In step 410, the second work item may be enqueued to a second persistent queue. For example, queue manager 340 may enqueue each respective second work item 330 to second persistent queue 130. Although three second work items 330 are shown in second persistent queue 130, there may be more or fewer second work items 330 in second persistent queue 130 depending on the number of immediate children of the first folder.

In step 412, in response to determining that the first pass did not complete scanning children of the first folder, the first work item may be updated for each second folder of the immediate children of the first folder by adding a folder identifier (ID) into an internal queue of the first work item. For example, if any of the immediate children of the first folder is a folder (i.e., a second folder), there may be another generation of children within the second folder(s) to scan for metadata. Queue manager 340 may be configured to determine whether there are additional children of the first folder to be scanned, and if so, may update first work item 320 by inserting a respective child folder ID 326, for each of the immediate children of the first folder that is a second folder, into internal queue 324 of first work item 320. Each of the respective child folder IDs 326 may be utilized to access metadata associated with children of the corresponding second folder in cloud based file hosting system 140. Although three child folder IDs 326 are shown in internal queue 324, there may be more or fewer than three child folder IDs 326 depending on the number of immediate children of the first folder that are second folders.

In step 414, the updated first work item may be inserted into the first persistent queue. For example, when there are more children to scan for metadata in cloud based file hosting system 140, queue manager 340 may be configured to enqueue the updated first work item 320, comprising internal queue 324, to first persistent queue 120.
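Steps 404 through 414 of the first pass can be sketched in Python as follows. This is a minimal in-memory sketch, not the described implementation: the deques stand in for the persistent queues, `FolderWorkItem` models first work item 320 with its internal queue, and `list_children` is a hypothetical callable that returns the folder's own metadata plus a list of metadata dicts for its immediate children, each with assumed `"id"` and `"is_folder"` keys.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Deque, Dict, List, Tuple

Meta = Dict[str, object]
# Hypothetical API: folder ID -> (folder metadata, immediate-children metadata).
ListChildren = Callable[[str], Tuple[Meta, List[Meta]]]


@dataclass
class FolderWorkItem:
    folder_id: str
    # Internal queue of child folder IDs still to be scanned.
    internal_queue: Deque[str] = field(default_factory=deque)


def first_pass(id_queue: Deque[FolderWorkItem],
               metadata_queue: Deque[Meta],
               list_children: ListChildren) -> None:
    # Step 404: dequeue the work item associated with the folder event.
    item = id_queue.popleft()
    # Step 406: fetch metadata of the folder and of its immediate children.
    folder_meta, children = list_children(item.folder_id)
    # Steps 408-410: one metadata work item for the folder and for each child.
    metadata_queue.append(folder_meta)
    metadata_queue.extend(children)
    # Step 412: record the ID of every child that is itself a folder.
    item.internal_queue.extend(c["id"] for c in children if c["is_folder"])
    # Step 414: re-insert the work item only if another generation remains.
    if item.internal_queue:
        id_queue.append(item)
```

If no immediate child is a folder, the work item is simply not re-inserted, which corresponds to the "did not complete scanning" condition in step 412 being false.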

Step 416 is a step for initiating a second pass in flowchart 400. Aspects of one or more steps in the second pass may be repeated in one or more subsequent passes.

In step 418, a child folder ID of the internal queue for a second folder of the immediate children may be dequeued from the first persistent queue. For example, as described above, each of the immediate children of the first folder that is a folder may be referred to as a second folder. Each of the child folder IDs 326 of internal queue 324 may be associated with a second folder. Queue manager 340 may be configured to dequeue a child folder ID 326 from internal queue 324 of first work item 320 in first persistent queue 120.

In step 420, second metadata of immediate children of the second folder of the immediate children of the first folder may be fetched based on the child folder ID. For example, API manager 350 may be configured to access API services 142 of cloud based file hosting system 140 and, for each second folder (e.g., one second folder at a time until a scanning limit is met), utilize a corresponding child folder ID 326 to request second metadata for each immediate child of the second folder (e.g., the children of a second folder are grandchildren of the first folder, and second metadata is fetched for each grandchild of the first folder until a limit is met). The immediate children of the second folders may comprise one or more folders and/or one or more files. API manager 350 may be configured to receive the requested second metadata from API services 142.

In step 422, a third work item comprising the fetched second metadata may be created. For example, queue manager 340 may be configured to create a third work item 334 comprising second metadata 336 for each immediate child of each second folder.

In step 424, the third work item is enqueued to the second persistent queue. For example, queue manager 340 may be configured to enqueue each of the third work items, comprising second metadata of a respective immediate child of each second folder, to second persistent queue 130.
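A minimal Python sketch of one second-pass iteration (steps 418 through 424) follows, under the same in-memory assumptions as before: deques stand in for the persistent queues and `list_children` is a hypothetical API callable. One further assumption, not stated in the text: the work item is taken off the ID queue, one child folder ID is popped, and the work item is put back only while child IDs remain. The function also returns the folder IDs found one generation down, which a later pass could use.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Deque, Dict, List, Tuple

Meta = Dict[str, object]


@dataclass
class FolderWorkItem:
    folder_id: str
    internal_queue: Deque[str] = field(default_factory=deque)


def second_pass(id_queue: Deque[FolderWorkItem],
                metadata_queue: Deque[Meta],
                list_children: Callable[[str], Tuple[Meta, List[Meta]]]) -> List[str]:
    """Scan one child folder ID; return folder IDs one generation further down."""
    item = id_queue.popleft()
    # Step 418: take the next child folder ID from the internal queue.
    child_id = item.internal_queue.popleft()
    # Step 420: fetch metadata of that folder's immediate children
    # (the folder's own metadata was already enqueued in the first pass).
    _, grandchildren = list_children(child_id)
    # Steps 422-424: one metadata work item per grandchild of the first folder.
    metadata_queue.extend(grandchildren)
    # Assumption: re-insert the work item while child folder IDs remain.
    if item.internal_queue:
        id_queue.append(item)
    return [g["id"] for g in grandchildren if g["is_folder"]]
```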

API manager 350 and queue manager 340 may operate in various ways to perform their functions. For instance, FIG. 5 is a flowchart 500 of a method for fetching metadata of a folder and immediate children of the folder, according to an example embodiment. Flowchart 500 may be performed as part of flowchart 400 (FIG. 4), such as during step 406. In an embodiment, API manager 350 and/or queue manager 340 may operate according to flowchart 500. Flowchart 500 is described as follows with reference to FIGS. 1 and 3.

Flowchart 500 begins with step 502. In step 502, the first metadata of the first folder and of immediate children of the first folder is fetched via a cloud service based on a first folder ID of the first folder. For example, as described above, API manager 350 may be configured to access API services 142 and, utilizing first folder ID 322, retrieve metadata for the first folder and the immediate children of the first folder, including one or more files and/or one or more folders.

In step 504, alternatively or in addition, the first metadata of the first folder and of immediate children of the first folder is fetched from the first work item. For example, in some embodiments, one or more of the cloud based file hosting systems 140 may be configured to send the first metadata for the first folder and/or for one or more of the immediate children of the first folder within the folder event received for the first folder (e.g., an event resulting from an activity associated with the first folder in cloud based file hosting system 140). In such an embodiment, the first metadata may be included in first work item 320 or may be stored in memory. Queue manager 340 may be configured to dequeue first work item 320 and retrieve the first metadata from first work item 320 or from memory.
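The choice between steps 502 and 504 can be sketched in Python as a simple fallback. This is an illustration under stated assumptions: the work item is a plain dict, embedded event metadata is assumed to live under a hypothetical `"metadata"` key, and `fetch_via_api` is a hypothetical callable standing in for the request to API services 142.

```python
from typing import Callable, Dict, List


def get_first_metadata(work_item: Dict[str, object],
                       fetch_via_api: Callable[[str], List[dict]]) -> List[dict]:
    # Step 504: some hosting systems embed metadata in the folder event;
    # this sketch assumes it was copied into the work item under "metadata".
    embedded = work_item.get("metadata")
    if embedded is not None:
        return embedded
    # Step 502: otherwise fetch it via the cloud service by folder ID.
    return fetch_via_api(work_item["folder_id"])
```

Using the embedded metadata when it is present avoids an API call for that folder, which matters when the hosting system throttles requests.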

API manager 350 and queue manager 340 may operate in various ways to perform their functions. For instance, FIG. 6 is a flowchart 600 of a method for fetching metadata in multiple sets, according to an example embodiment. Flowchart 600 may be performed as part of flowchart 400 (FIG. 4), such as during step 406. In an embodiment, API manager 350 and queue manager 340 may operate according to flowchart 600. Flowchart 600 is described as follows with reference to FIGS. 1 and 3.

Flowchart 600 begins with step 602. In step 602, the fetched first metadata comprises a first set of metadata of the first folder and the immediate children of the first folder. For example, in some embodiments, API services 142 of cloud based file hosting system 140 may be configured to transmit first metadata associated with the first folder and the immediate children of the first folder in one or more sets to API manager 350, each set comprising a portion of the first metadata. The sets may be transmitted via multiple separate transmissions (e.g., to manage usage of scanning resources and reduce latency in the system). Similarly, portions of second metadata may be transmitted by API services 142 to API manager 350 in multiple sets. In some embodiments, a set of metadata may be transmitted as a page of metadata, and the amount of metadata allowed per page may be limited. In some embodiments, the limit may be specified and/or controlled by cloud based file hosting system 140 or by cloud server 102. In this manner, system 300 may honor the throttling mechanism policies of different API service providers. In some embodiments, scan limit parameters 360 may comprise a parameter for the number of metadata elements that may be included in the transmission of a page.

In step 604, the fetched first metadata includes a token for accessing a second set of metadata of the first folder and the immediate children of the first folder. For example, when API services 142 transmits first metadata in multiple sets, API services 142 may be configured to include a token with each set that is followed by a subsequent set in another transmission. API manager 350 may be configured to include the token in a request for the next set of first metadata to fetch that set. Similarly, additional fetched metadata, such as fetched second metadata, may be received in sets with tokens for any subsequent sets to be transmitted separately.
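The token-based retrieval of steps 602 and 604 follows a common continuation-token pagination pattern, sketched below in Python. The page shape (`"entries"` and `"next_token"` keys) and the `make_fake_api` simulator are hypothetical; real hosting services name these fields differently.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical signature: (folder_id, token) -> {"entries": [...], "next_token": str or None}
FetchPage = Callable[[str, Optional[str]], Dict[str, object]]


def fetch_all_sets(folder_id: str, fetch_page: FetchPage) -> List[dict]:
    """Collect every set (page) of metadata by following continuation tokens."""
    entries: List[dict] = []
    token: Optional[str] = None
    while True:
        page = fetch_page(folder_id, token)
        entries.extend(page["entries"])
        # A token is returned only when another set follows.
        token = page.get("next_token")
        if token is None:
            return entries


def make_fake_api(items: List[dict], page_size: int) -> FetchPage:
    """Simulated service that returns at most page_size entries per set."""
    def fetch_page(folder_id: str, token: Optional[str]) -> Dict[str, object]:
        start = int(token) if token else 0
        end = start + page_size
        next_token = str(end) if end < len(items) else None
        return {"entries": items[start:end], "next_token": next_token}
    return fetch_page
```

For example, seven items with a page size of three are returned in three sets (3, 3, and 1 entries), and the final set carries no token.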

VM node 110 may operate in various ways to perform its functions. For instance, FIG. 7 is a flowchart 700 comprising a step for enqueueing metadata of children of a first folder associated with a folder event, according to an example embodiment. VM node 110 may operate according to flowchart 700. Flowchart 700 is described as follows with reference to FIGS. 1 and 3.

Flowchart 700 includes step 702. In step 702, the cloud server comprises a virtual machine node of a plurality of virtual machine nodes that executes the method for enqueueing metadata of children of the first folder associated with the folder event. For example, cloud server 102 may comprise a plurality of VM nodes such as VM node 110, where each of the plurality of VM nodes may be configured to execute the steps of all or a portion of flowcharts 200, 400, 500, 600, 700, 800, 900, 1000, and/or 1100.

API manager 350 may operate in various ways to perform its functions. For instance, FIG. 8 is a flowchart 800 of a method for receiving a folder event from a cloud based file hosting system, according to an example embodiment. Flowchart 800 may be performed as part of flowchart 400 (FIG. 4), such as before step 402. In an embodiment, API manager 350 may operate according to flowchart 800. Flowchart 800 is described as follows with reference to FIGS. 1 and 3.

Flowchart 800 begins with step 802. In step 802, prior to said dequeuing of said first pass, the folder event comprising at least a first folder ID of the first folder is received from a cloud based file hosting system. For example, prior to dequeuing first work item 320 from first persistent queue 120, where first work item 320 is associated with the first folder and the folder event, the folder event is received from cloud based file hosting system 140 via API services 142. The folder event may comprise first folder ID 322, which is associated with the first folder.

In step 804, the folder event is triggered in the cloud based file hosting system based on at least one of: editing a file or folder, creating a file or folder, removing a file or folder, adding a collaborator to a file or folder, or sharing a file or a folder. For example, the folder event may be triggered in cloud based file hosting system 140 based on editing a file or folder, creating a file or folder, removing a file or folder, adding a collaborator to a file or folder, or sharing a file or a folder, where the file or folder may comprise a first file or the first folder.

Second persistent queue 130 may operate in various ways to perform its functions. For instance, FIG. 9 is a flowchart 900 of a method for queueing metadata of a folder or a file, according to an example embodiment. Flowchart 900 may be performed as part of flowchart 400 (FIG. 4), such as after step 404 and/or after step 418. In an embodiment, second persistent queue 130 may operate according to flowchart 900. Flowchart 900 is described as follows with reference to FIGS. 1 and 3.

Flowchart 900 includes step 902. In step 902, at least one of the metadata of the first folder, the metadata of the immediate children of the first folder, or the metadata of the immediate children of the immediate children of the first folder comprises: a size of a folder or file, a name of a folder or file, an owner of a folder or file, an indication of user collaborators of a folder or file, permissions of a folder or file, a formatting type of a folder or file, a hash of a folder or file, a creator of a folder or file, a date of creation of a folder or file, or a modification date of a folder or file.
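One way to carry the metadata fields listed in step 902 inside a metadata work item is a simple record type, sketched in Python below. The field names, the representation of permissions as a list of strings, and the optional hash are assumptions for illustration; hosting services expose these attributes under their own names and formats.

```python
from dataclasses import asdict, dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class ItemMetadata:
    """Hypothetical record for one file or folder in a metadata work item."""
    name: str
    size: int                              # size in bytes
    owner: str
    creator: str
    created: datetime                      # date of creation
    modified: datetime                     # modification date
    format_type: str                       # e.g. a file extension or MIME type
    content_hash: Optional[str] = None     # folders may have no hash
    collaborators: List[str] = field(default_factory=list)
    permissions: List[str] = field(default_factory=list)
```

`dataclasses.asdict` can then turn a record into a plain dict for serialization into the persistent queue.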

API manager 350 and queue manager 340 may operate in various ways to perform their functions. For instance, FIG. 10 is a flowchart 1000 of a method for repeating steps for enqueuing metadata associated with a folder event, according to an example embodiment. Flowchart 1000 may be performed as part of flowchart 400 (FIG. 4), such as after step 416. In an embodiment, API manager 350 and queue manager 340 may operate according to flowchart 1000. Flowchart 1000 is described as follows with reference to FIGS. 1 and 3.

Flowchart 1000 includes step 1002. In step 1002, for each remaining child folder ID (if any) of the internal queue for a second folder of the immediate children, said dequeuing, said fetching, said creating, and said enqueuing of said second pass are repeated. For example, as described above, internal queue 324 of first work item 320 in first persistent queue 120 may comprise one or more child folder IDs 326 that each correspond to one of the second folders (i.e., immediate folder children of the first folder). Although three child folder IDs 326 are shown in internal queue 324, there may be more or fewer child folder IDs 326 in the queue depending on how many of the immediate children of the first folder are folders (i.e., second folders). If, after the second pass (initiated in step 416 of flowchart 400), any child folder IDs 326 remain in internal queue 324 (e.g., and a scanning limit has not been met), then steps 418, 420, 422, and 424 of flowchart 400 may be repeated for each child folder ID 326 remaining in internal queue 324. If none of the children of the first folder are folders, then queue manager 340 may not have created first work item 320 with internal queue 324.
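Step 1002 amounts to draining the internal queue, running the second-pass steps once per remaining child folder ID. The Python sketch below assumes an in-memory deque for internal queue 324 and a hypothetical `list_children` API callable; as an added assumption, it also collects the folder IDs of the next generation so that a caller could continue the breadth first search.

```python
from collections import deque
from typing import Callable, Deque, Dict, List, Tuple

Meta = Dict[str, object]


def scan_generation(internal_queue: Deque[str],
                    metadata_queue: Deque[Meta],
                    list_children: Callable[[str], Tuple[Meta, List[Meta]]]) -> List[str]:
    """Repeat the second-pass steps for every remaining child folder ID."""
    next_generation: List[str] = []
    while internal_queue:                         # "if any" remaining IDs
        child_id = internal_queue.popleft()       # step 418
        _, children = list_children(child_id)     # step 420
        metadata_queue.extend(children)           # steps 422-424
        # Folders found here belong to the next generation.
        next_generation.extend(c["id"] for c in children if c["is_folder"])
    return next_generation
```

An empty internal queue (no folder children) simply performs no iterations.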

Queue manager 340 and API manager 350 may operate in various ways to perform their functions. For instance, FIG. 11 is a flowchart 1100 of a method for repeating steps for enqueuing metadata associated with a folder event for each generation of children of a first folder or until a specified limit is met, according to an example embodiment. Flowchart 1100 may be performed as part of flowchart 400 (FIG. 4), such as after step 410. In an embodiment, queue manager 340 and API manager 350 may operate according to flowchart 1100. Flowchart 1100 is described as follows with reference to FIGS. 1 and 3.

Flowchart 1100 includes step 1102. In step 1102, said updating the first work item and said enqueueing the updated first work item of said first pass, and said dequeuing, said fetching, said creating, and said enqueuing of said second pass, are repeated for each generation of children of the first folder or until a specified limit is met. For example, queue manager 340 and API manager 350 may perform steps 412, 414, 416, 418, 420, 422, and 424 for each folder that is an immediate child of a folder in any generation of children under the first folder. In some embodiments, scan limit parameters 360 may be configured in VM node 110. Scan limit parameters 360 may specify limits or criteria for ending the scanning process for fetching metadata of the first folder and children of the first folder. For example, scan limit parameters 360 may indicate when to discontinue scanning for metadata in order to conserve resources or avoid latency in VM node 110, cloud based file hosting system 140, or the communication channels between them. In some embodiments, the scanning for metadata may continue beyond a configured limit to complete the scanning of files in a folder. For example, at the end of scanning all of the files in a folder, the number of files scanned may be compared to a scan limit parameter 360, and if the limit has been reached, the scanning may stop.
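The generation-by-generation repetition with a soft limit can be condensed into a single in-memory Python function. This sketch collapses the two persistent queues into local lists and uses a hypothetical `list_children` callable and a hypothetical `max_items` parameter in place of scan limit parameters 360. As described above, the limit is checked only after a folder's children have all been collected, so no folder is left partially scanned; the count covers items below the first folder only.

```python
from typing import Callable, Dict, List, Tuple

Meta = Dict[str, object]


def bfs_scan(root_id: str,
             list_children: Callable[[str], Tuple[Meta, List[Meta]]],
             max_items: int) -> List[Meta]:
    """Breadth-first metadata scan with a soft limit on scanned items."""
    # First pass: the first folder and its immediate children.
    root_meta, children = list_children(root_id)
    metadata: List[Meta] = [root_meta, *children]
    scanned = len(children)
    generation = [c["id"] for c in children if c["is_folder"]]
    # Later passes: one generation of folders at a time.
    while generation and scanned < max_items:
        next_generation: List[str] = []
        for folder_id in generation:
            _, children = list_children(folder_id)
            metadata.extend(children)
            scanned += len(children)
            next_generation.extend(c["id"] for c in children if c["is_folder"])
            if scanned >= max_items:  # checked only after finishing the folder
                break
        generation = next_generation
    return metadata
```

With a limit of 4 and a first folder holding one file and two subfolders of 3 and 1 files, the scan finishes the first subfolder (reaching 6 items) and then stops before the second.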

Queue manager 340 and API manager 350 may operate in various ways to perform their functions. For instance, FIG. 12 is a flowchart 1200 of a method for limiting file and folder scans for fetching metadata, according to an example embodiment. Flowchart 1200 may be performed as part of flowchart 400 (FIG. 4), such as after step 404. In an embodiment, queue manager 340 and API manager 350 may operate according to flowchart 1200. Flowchart 1200 is described as follows with reference to FIGS. 1 and 3.

Flowchart 1200 includes step 1202. In step 1202, the specified limit is based on: a number of files scanned for metadata, or a depth of generations of children of the first folder. For example, scan limit parameters 360 may be based on the depth of a scan (e.g., the number of generations of children of the first folder to scan for metadata in cloud based file hosting system 140). Alternatively or in addition, scan limit parameters 360 may be based on the number of files or folders that are allowed to be scanned for metadata in cloud based file hosting system 140, for example, in response to the folder event for the first folder received from cloud based file hosting system 140.
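A limit based on an item count, a depth, or both can be expressed as a small check, sketched below in Python. `ScanLimits` and `limit_reached` are hypothetical names modeling scan limit parameters 360; here `depth` is assumed to mean the number of generations below the first folder already scanned, and a limit left as `None` is treated as unset.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ScanLimits:
    max_items: Optional[int] = None   # upper bound on files/folders scanned
    max_depth: Optional[int] = None   # generations below the first folder


def limit_reached(limits: ScanLimits, items_scanned: int, depth: int) -> bool:
    """Return True once either configured limit has been met."""
    if limits.max_items is not None and items_scanned >= limits.max_items:
        return True
    if limits.max_depth is not None and depth >= limits.max_depth:
        return True
    return False
```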

In general, the following limitations may be set in systems 100 and 300: (1) an upper bound on the total number of scanned sub-files and sub-folders of a first folder, (2) a limit on the depth (from the first folder) to scan, (3) a dynamic update of the upper bound to allow completing a full scan of a sub-folder (e.g., avoiding a partial scan, if possible), and (4) breaking a folder scan into iterations of smaller scans (e.g., pages of metadata), each of which may be limited to a fixed number of API calls to API services 142.
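Limitation (4) can be sketched as one bounded iteration of a paged folder scan in Python. The `calls_per_iteration` parameter and the page shape (`"entries"`, `"next_token"`) are assumptions for illustration. The iteration stops after a fixed number of API calls and returns a resume token, so that the unfinished remainder could be re-enqueued as a new work item rather than holding a worker.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical signature: (folder_id, token) -> {"entries": [...], "next_token": str or None}
FetchPage = Callable[[str, Optional[str]], Dict[str, object]]


def scan_iteration(folder_id: str, token: Optional[str], fetch_page: FetchPage,
                   calls_per_iteration: int) -> Tuple[List[dict], Optional[str]]:
    """Run one bounded iteration of a folder scan.

    Returns the entries fetched and a resume token (None once the folder
    is finished) that the caller can store in a re-enqueued work item.
    """
    entries: List[dict] = []
    for _ in range(calls_per_iteration):   # at most this many API calls
        page = fetch_page(folder_id, token)
        entries.extend(page["entries"])
        token = page.get("next_token")
        if token is None:
            break
    return entries, token
```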

Several technical improvements may be achieved by systems 100 and 300. For example, by enforcing the above-described limitations on the process of enqueuing metadata into second persistent queue 130 (i.e., the metadata queue), VM nodes 110 of cloud server 102 may be configured to handle thousands of events per second with low latency. Also, VM nodes 110 may be configured to handle events from the same drive of a cloud based file hosting system 140 in parallel. Furthermore, resources may be conserved by splitting the bandwidth (e.g., API calls per second) evenly among different events to avoid starvation.
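Splitting API-call bandwidth evenly among events can be illustrated with a round-robin allocation, sketched in Python below. This is one possible fairness scheme, not necessarily the one used by VM nodes 110: `share_budget`, the event IDs, and the per-round `budget` are hypothetical. Each event with outstanding work receives one call per round until the budget is spent, so a very large folder scan cannot starve a small one.

```python
from collections import deque
from typing import Deque, Dict


def share_budget(pending_calls: Dict[str, int], budget: int) -> Dict[str, int]:
    """Split an API-call budget round-robin among events.

    pending_calls maps a hypothetical event ID to the calls it still needs.
    Returns how many calls each event is granted in this round.
    """
    grants = {event_id: 0 for event_id in pending_calls}
    ring: Deque[str] = deque(e for e, n in pending_calls.items() if n > 0)
    while budget > 0 and ring:
        event_id = ring.popleft()
        grants[event_id] += 1
        budget -= 1
        if grants[event_id] < pending_calls[event_id]:
            ring.append(event_id)   # still needs calls: back of the line
    return grants
```

For instance, with a budget of 10 calls, an event needing 1000 calls and one needing 2 receive 8 and 2 respectively: the small event is served in full and the remainder flows to the large one.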

The methods described herein may be generic to most software as a service systems that support a file hosting service. Moreover, systems 100 and 300 may honor the throttling mechanism policies of different API service providers.

III. Example Computer System Implementation

Embodiments described herein may be implemented in hardware, or hardware combined with software and/or firmware. For example, embodiments described herein may be implemented as computer program code/instructions configured to be executed in one or more processors and stored in a computer readable storage medium. Alternatively, embodiments described herein may be implemented as hardware logic/electrical circuitry.

As noted herein, the embodiments described, including but not limited to, systems 100 and 300 along with any components and/or subcomponents thereof, as well as any operations and portions of flowcharts/flow diagrams described herein and/or further examples described herein, may be implemented in hardware, or hardware with any combination of software and/or firmware, including being implemented as computer program code configured to be executed in one or more processors and stored in a computer readable storage medium, or being implemented as hardware logic/electrical circuitry, such as being implemented together in a system-on-chip (SoC), a field programmable gate array (FPGA), an application specific integrated circuit (ASIC), a trusted platform module (TPM), and/or the like. A SoC may include an integrated circuit chip that includes one or more of a processor (e.g., a microcontroller, microprocessor, digital signal processor (DSP), etc.), memory, one or more communication interfaces, and/or further circuits and/or embedded firmware to perform its functions.

Embodiments described herein may be implemented in one or more computing devices similar to a mobile system and/or a computing device in stationary or mobile computer embodiments, including one or more features of mobile systems and/or computing devices described herein, as well as alternative features. The descriptions of computing devices provided herein are provided for purposes of illustration, and are not intended to be limiting. Embodiments may be implemented in further types of computer systems, as would be known to persons skilled in the relevant art(s).

FIG. 13 is a block diagram of an example processor-based computer system 1300 that may be used to implement various embodiments. Cloud server 102, virtual machine nodes 110, and cloud based file hosting systems 140 may include any type of computing device, mobile or stationary, such as a desktop computer, a server, a video game console, etc. For example, cloud server 102, virtual machine nodes 110, and cloud based file hosting systems 140 may be any type of mobile computing device (e.g., a Microsoft® Surface® device, a personal digital assistant (PDA), a laptop computer, a notebook computer, a tablet computer such as an Apple iPad™, a netbook, etc.), a mobile phone (e.g., a cell phone, a smart phone such as a Microsoft Windows® phone, an Apple iPhone, a phone implementing the Google® Android™ operating system, etc.), a wearable computing device (e.g., a head-mounted device including smart glasses such as Google® Glass™, Oculus Rift® by Oculus VR, LLC, etc.), a stationary computing device such as a desktop computer or PC (personal computer), a gaming console/system (e.g., Microsoft Xbox®, Sony PlayStation®, Nintendo Wii® or Switch®, etc.), etc.

Cloud server 102, virtual machine nodes 110, and cloud based file hosting systems 140 may each be implemented in one or more computing devices containing features similar to those of computing device 1300 in stationary or mobile computer embodiments and/or alternative features. The description of computing device 1300 provided herein is provided for purposes of illustration, and is not intended to be limiting. Embodiments may be implemented in further types of computer systems, as would be known to persons skilled in the relevant art(s).

As shown in FIG. 13, computing device 1300 includes one or more processors, referred to as processor circuit 1302, a system memory 1304, and a bus 1306 that couples various system components including system memory 1304 to processor circuit 1302. Processor circuit 1302 is an electrical and/or optical circuit implemented in one or more physical hardware electrical circuit device elements and/or integrated circuit devices (semiconductor material chips or dies) as a central processing unit (CPU), a microcontroller, a microprocessor, and/or other physical hardware processor circuit. Processor circuit 1302 may execute program code stored in a computer readable medium, such as program code of operating system 1330, application programs 1332, other programs 1334, etc. Bus 1306 represents one or more of any of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. System memory 1304 includes read only memory (ROM) 1308 and random-access memory (RAM) 1310. A basic input/output system 1312 (BIOS) is stored in ROM 1308.

Computing device 1300 also has one or more of the following drives: a hard disk drive 1314 for reading from and writing to a hard disk, a magnetic disk drive 1316 for reading from or writing to a removable magnetic disk 1318, and an optical disk drive 1320 for reading from or writing to a removable optical disk 1322 such as a CD ROM, DVD ROM, or other optical media. Hard disk drive 1314, magnetic disk drive 1316, and optical disk drive 1320 are connected to bus 1306 by a hard disk drive interface 1324, a magnetic disk drive interface 1326, and an optical drive interface 1328, respectively. The drives and their associated computer-readable media provide nonvolatile storage of computer-readable instructions, data structures, program modules and other data for the computer. Although a hard disk, a removable magnetic disk and a removable optical disk are described, other types of hardware-based computer-readable storage media can be used to store data, such as flash memory cards, digital video disks, RAMs, ROMs, and other hardware storage media.

A number of program modules may be stored on the hard disk, magnetic disk, optical disk, ROM, or RAM. These programs include operating system 1330, one or more application programs 1332, other programs 1334, and program data 1336. Application programs 1332 or other programs 1334 may include, for example, computer program logic (e.g., computer program code or instructions) for implementing cloud server 102, virtual machine nodes 110, cloud based file hosting systems 140, first persistent queue 120, second persistent queue 130, API services 142, queue manager 340, API manager 350, internal queue 324, any one or more of flowcharts 200, 400, 500, 600, 700, 800, 900, 1000, 1100, 1200 (including any step thereof), and/or further embodiments described herein. Program data 1336 may include first work item 320, first folder ID 322, updated first work item 320, internal queue 324, child folder IDs 326, second work items 330, first metadata 332, third work items 334, second metadata 336, scan limit parameters 360, and/or further embodiments described herein.

A user may enter commands and information into computing device 1300 through input devices such as keyboard 1338 and pointing device 1340. Other input devices (not shown) may include a microphone, joystick, game pad, satellite dish, scanner, a touch screen and/or touch pad, a voice recognition system to receive voice input, a gesture recognition system to receive gesture input, or the like. These and other input devices are often connected to processor circuit 1302 through a serial port interface 1342 that is coupled to bus 1306, but may be connected by other interfaces, such as a parallel port, game port, or a universal serial bus (USB).

A display screen 1344 is also connected to bus 1306 via an interface, such as a video adapter 1346. Display screen 1344 may be external to, or incorporated in, computing device 1300. Display screen 1344 may display information, as well as being a user interface for receiving user commands and/or other information (e.g., by touch, finger gestures, virtual keyboard, etc.). In addition to display screen 1344, computing device 1300 may include other peripheral output devices (not shown) such as speakers and printers.

Computing device 1300 is connected to a network 1348 (e.g., the Internet) through an adaptor or network interface 1350, a modem 1352, or other means for establishing communications over the network. Modem 1352, which may be internal or external, may be connected to bus 1306 via serial port interface 1342, as shown in FIG. 13, or may be connected to bus 1306 using another interface type, including a parallel interface.

As used herein, the terms “computer program medium,” “computer-readable medium,” and “computer-readable storage medium” are used to refer to physical hardware media such as the hard disk associated with hard disk drive 1314, removable magnetic disk 1318, removable optical disk 1322, other physical hardware media such as RAMs, ROMs, flash memory cards, digital video disks, zip disks, MEMs, nanotechnology-based storage devices, and further types of physical/tangible hardware storage media. Such computer-readable storage media are distinguished from and non-overlapping with communication media (do not include communication media). Communication media embodies computer-readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave. The term “modulated data signal” means a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of example, and not limitation, communication media includes wireless media such as acoustic, RF, infrared and other wireless media, as well as wired media. Embodiments are also directed to such communication media that are separate and non-overlapping with embodiments directed to computer-readable storage media.

As noted above, computer programs and modules (including application programs 1332 and other programs 1334) may be stored on the hard disk, magnetic disk, optical disk, ROM, RAM, or other hardware storage medium. Such computer programs may also be received via network interface 1350, serial port interface 1342, or any other interface type. Such computer programs, when executed or loaded by an application, enable computing device 1300 to implement features of embodiments discussed herein. Accordingly, such computer programs represent controllers of computing device 1300.

Embodiments are also directed to computer program products comprising computer code or instructions stored on any computer-readable medium. Such computer program products include hard disk drives, optical disk drives, memory device packages, portable memory sticks, memory cards, and other types of physical storage hardware.

IV. Additional Examples and Advantages

In an embodiment, a system in a cloud server for enqueueing metadata of children of a first folder associated with a folder event comprises one or more processors, and one or more memory devices that store program code to be executed by the one or more processors. The program code comprises a queue manager that is configured to, in a first pass, dequeue, from a first persistent queue, a first work item associated with the folder event. An API manager is configured to, in the first pass, fetch first metadata of the first folder and of immediate children of the first folder, where the immediate children comprise at least one of a folder or a file. The queue manager is further configured to, in the first pass: create a second work item comprising the fetched first metadata, enqueue the second work item to a second persistent queue, update the first work item for each folder of the immediate children by adding a child folder identifier (ID) into an internal queue of the first work item, and insert the updated first work item into the first persistent queue. In a second pass, the queue manager is further configured to dequeue, from the first persistent queue, a child folder ID of the internal queue for a second folder of the immediate children. The API manager is further configured to, in the second pass, fetch second metadata of immediate children of the second folder of the immediate children of the first folder based on the child folder ID. The queue manager is further configured to, in the second pass: create a third work item comprising the fetched second metadata and enqueue the third work item to the second persistent queue.

In an embodiment of the foregoing system, the first metadata of the first folder and of immediate children of the first folder is fetched via a cloud service based on a first folder ID of the first folder, or from the first work item.

In an embodiment of the foregoing system, the fetched first metadata comprises a first set of metadata of the first folder and the immediate children of the first folder, and the fetched first metadata includes a token for accessing a second set of metadata of the first folder and the immediate children of the first folder.

In an embodiment of the foregoing system, the cloud server comprises a virtual machine node of a plurality of virtual machine nodes that executes the method for enqueueing metadata of children of the first folder associated with the folder event.
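Because the queues are shared rather than bound to a single machine, several virtual machine nodes can drain them concurrently. The sketch below simulates this with threads, under several assumptions: `queue.Queue` stands in for the persistent queues, each thread plays the part of one node, and, as a simplification of the internal-queue work item described above, child folder IDs are put directly onto the shared ID queue. `TREE`, `node_worker`, and `scan_with_nodes` are hypothetical names.

```python
import queue
import threading

# Hypothetical folder tree: folder ID -> list of (child ID, is_folder).
TREE = {
    "root": [("a", True), ("b", True), ("f1", False)],
    "a": [("a1", True), ("f2", False)],
    "b": [("f3", False)],
    "a1": [("f4", False)],
}

def node_worker(id_queue, metadata_queue):
    """One simulated VM node: take a folder ID, record its children, enqueue child folders."""
    while True:
        folder_id = id_queue.get()
        try:
            children = TREE.get(folder_id, [])
            metadata_queue.put((folder_id, [cid for cid, _ in children]))
            for cid, is_folder in children:
                if is_folder:
                    # Enqueued before task_done(), so join() cannot return early.
                    id_queue.put(cid)
        finally:
            id_queue.task_done()

def scan_with_nodes(root, node_count=3):
    """Start node_count simulated nodes, scan from root, and collect the metadata."""
    id_queue, metadata_queue = queue.Queue(), queue.Queue()
    for _ in range(node_count):
        threading.Thread(target=node_worker, args=(id_queue, metadata_queue), daemon=True).start()
    id_queue.put(root)
    id_queue.join()  # blocks until every enqueued folder ID has been processed
    results = {}
    while not metadata_queue.empty():
        folder_id, children = metadata_queue.get()
        results[folder_id] = children
    return results

print(scan_with_nodes("root"))
```

The order in which nodes finish is not deterministic, so the collected results are returned as a dictionary keyed by folder ID rather than as an ordered list.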

In an embodiment of the foregoing system, the queue manager is further configured to, prior to said dequeuing of said first pass, receive, from a cloud-based file system, the folder event comprising at least a first folder ID of the first folder, where the folder event is triggered in the cloud-based file system based on at least one of: editing a file or folder, creating a file or folder, removing a file or folder, adding a collaborator to a file or folder, or sharing a file or a folder.
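Receiving the folder event and turning it into the first work item can be sketched as follows. The event payload shape (`folder_id` and `action` keys), the `FolderAction` enumeration, and `on_folder_event` are assumptions made for illustration; the enumerated actions mirror the triggers listed above.

```python
from collections import deque
from enum import Enum

class FolderAction(Enum):
    """Actions that, per the embodiment above, can trigger a folder event."""
    EDIT = "edit"
    CREATE = "create"
    REMOVE = "remove"
    ADD_COLLABORATOR = "add_collaborator"
    SHARE = "share"

def on_folder_event(event, id_queue):
    """Validate a hypothetical event payload and enqueue a first work item for its folder."""
    action = FolderAction(event["action"])  # raises ValueError for an unrecognized action
    id_queue.append({
        "folder_id": event["folder_id"],    # the event carries at least the first folder ID
        "action": action,
        "internal_queue": deque(),          # child folder IDs are added during the first pass
    })

id_queue = deque()
on_folder_event({"folder_id": "F123", "action": "share"}, id_queue)
print(id_queue[0]["folder_id"], id_queue[0]["action"].name)  # F123 SHARE
```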

In an embodiment of the foregoing system, at least one of the metadata of the first folder, the metadata of the immediate children of the first folder, or the metadata of the immediate children of the immediate children of the first folder comprises: a size of a folder or file, a file name of a folder or file, an owner of a folder or file, an indication of user collaborators of a folder or file, permissions of a folder or file, a formatting type of a folder or file, a hash of a folder or file, a creator of a folder or file, a date of creation of a folder or file, or a modification date of a folder or file.
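One possible representation of a single metadata record carried in a work item is a dataclass whose fields follow the list above. The class name, field names, and types are assumptions. Because the embodiment requires only "at least one of" these attributes, every field other than the name and the folder flag defaults to `None` or an empty container.

```python
from dataclasses import dataclass, field, asdict
from datetime import datetime
from typing import Optional

@dataclass
class ItemMetadata:
    name: str                            # file name of the folder or file
    is_folder: bool
    size: Optional[int] = None           # size in bytes
    owner: Optional[str] = None
    creator: Optional[str] = None
    created: Optional[datetime] = None   # date of creation
    modified: Optional[datetime] = None  # modification date
    format_type: Optional[str] = None    # e.g. "docx"; typically None for folders
    content_hash: Optional[str] = None
    collaborators: list = field(default_factory=list)
    permissions: dict = field(default_factory=dict)

record = ItemMetadata(
    name="plan.docx", is_folder=False, size=2048, owner="alice",
    collaborators=["bob"], permissions={"bob": "edit"},
    modified=datetime(2021, 1, 21),
)
payload = asdict(record)  # plain dict, e.g. for serializing into a work item
print(payload["collaborators"], payload["permissions"])  # ['bob'] {'bob': 'edit'}
```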

In an embodiment of the foregoing system, the queue manager and the API manager are further configured to: repeat, for each remaining child folder ID, if any, of the internal queue for a folder of the immediate children, said dequeuing, said fetching, said creating, and said enqueuing of said second pass.

In an embodiment of the foregoing system, the queue manager and the API manager are further configured to repeat said dequeuing, said fetching, said creating, and said enqueuing of said second pass for each generation of children of the first folder or until a specified limit is met.

In an embodiment of the foregoing system, the specified limit is based on a number of files scanned for metadata or a depth of generations of children of the first folder.
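The two kinds of specified limit can be sketched as a breadth-first scan with an optional item budget and an optional generation depth. This is a simplified single-queue illustration rather than the two-queue design described above; `TREE`, `scan_with_limits`, `max_items`, and `max_depth` are hypothetical names, and `max_depth` here counts generations of children below the first folder.

```python
from collections import deque

# Hypothetical folder tree: folder ID -> list of (child ID, is_folder).
TREE = {
    "root": [("a", True), ("b", True), ("f1", False)],
    "a": [("a1", True), ("f2", False)],
    "b": [("f3", False)],
    "a1": [("f4", False)],
}

def scan_with_limits(root, max_items=None, max_depth=None):
    """Breadth-first scan that stops at an item budget or a generation depth."""
    scanned = []
    frontier = deque([(root, 0)])  # depth 0 is the first folder itself
    while frontier:
        folder, depth = frontier.popleft()
        if max_depth is not None and depth >= max_depth:
            continue  # this folder's children would exceed the depth limit
        for child, is_folder in TREE.get(folder, []):
            if max_items is not None and len(scanned) >= max_items:
                return scanned  # item budget reached
            scanned.append(child)
            if is_folder:
                frontier.append((child, depth + 1))
    return scanned

print(scan_with_limits("root", max_depth=1))  # ['a', 'b', 'f1']
print(scan_with_limits("root", max_items=4))  # ['a', 'b', 'f1', 'a1']
```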

In an embodiment, a method in a cloud server for enqueueing metadata of children of a first folder associated with a folder event includes, in a first pass: dequeuing, from a first persistent queue, a first work item associated with the folder event, fetching first metadata of the first folder and of immediate children of the first folder, where the immediate children comprise at least one of a folder or a file, creating a second work item comprising the fetched first metadata, enqueuing the second work item to a second persistent queue, updating the first work item for each folder of the immediate children by adding a child folder identifier (ID) into an internal queue of the first work item, and inserting the updated first work item into the first persistent queue. In a second pass, the method further includes dequeuing, from the first persistent queue, a child folder ID of the internal queue for a second folder of the immediate children, fetching second metadata of immediate children of the second folder of the immediate children of the first folder based on the child folder ID, creating a third work item comprising the fetched second metadata, and enqueuing the third work item to the second persistent queue.

In an embodiment of the foregoing method, the first metadata of the first folder and of immediate children of the first folder is fetched via a cloud service based on a first folder ID of the first folder or from the first work item.

In an embodiment of the foregoing method, the fetched first metadata comprises a first set of metadata of the first folder and the immediate children of the first folder, and the fetched first metadata includes a token for accessing a second set of metadata of the first folder and the immediate children of the first folder.

In an embodiment of the foregoing method, the cloud server comprises a virtual machine node of a plurality of virtual machine nodes that executes the method for enqueueing metadata of children of the first folder associated with the folder event.

In an embodiment of the foregoing method, prior to said dequeuing of said first pass, the method further comprises receiving, from a cloud-based file system, the folder event comprising at least a first folder ID of the first folder, where the folder event is triggered in the cloud-based file system based on at least one of: editing a file or folder, creating a file or folder, removing a file or folder, adding a collaborator to a file or folder, or sharing a file or a folder.

In an embodiment of the foregoing method, at least one of the metadata of the first folder, the metadata of the immediate children of the first folder, or the metadata of the immediate children of the immediate children of the first folder comprises: a size of a folder or file, a file name of a folder or file, an owner of a folder or file, an indication of user collaborators of a folder or file, permissions of a folder or file, a formatting type of a folder or file, a hash of a folder or file, a creator of a folder or file, a date of creation of a folder or file, or a modification date of a folder or file.

In an embodiment of the foregoing method, the method further comprises repeating, for each remaining child folder ID, if any, of the internal queue for a folder of the immediate children: said dequeuing, said fetching, said creating, and said enqueuing of said second pass.

In an embodiment of the foregoing method, the method further comprises repeating said dequeuing, said fetching, said creating, and said enqueuing of said second pass, for each generation of children of the first folder or until a specified limit is met.

In an embodiment, a computer-readable medium has program code recorded thereon that when executed by at least one processor causes the at least one processor to perform a method for enqueueing metadata of children of a first folder associated with a folder event. The method comprises, in a first pass: dequeuing, from a first persistent queue, a first work item associated with the folder event, fetching first metadata of the first folder and of immediate children of the first folder, where the immediate children comprise at least one of a folder or a file, creating a second work item comprising the fetched first metadata, enqueuing the second work item to a second persistent queue, updating the first work item for each folder of the immediate children by adding a child folder identifier (ID) into an internal queue of the first work item, and inserting the updated first work item into the first persistent queue. In a second pass, the method further comprises: dequeuing, from the first persistent queue, a child folder ID of the internal queue for a second folder of the immediate children, fetching second metadata of immediate children of the second folder of the immediate children of the first folder based on the child folder ID, creating a third work item comprising the fetched second metadata, and enqueuing the third work item to the second persistent queue.

In an embodiment of the foregoing computer-readable medium, the method further comprises repeating, for each remaining child folder ID, if any, of the internal queue for a folder of the immediate children: said dequeuing, said fetching, said creating, and said enqueuing of said second pass.

In an embodiment of the foregoing computer-readable medium, the method further comprises repeating said dequeuing, said fetching, said creating, and said enqueuing of said second pass, for each generation of children of the first folder or until a specified limit is met.

V. Conclusion

While various embodiments of the present application have been described above, it should be understood that they have been presented by way of example only, and not limitation. It will be understood by those skilled in the relevant art(s) that various changes in form and details may be made therein without departing from the spirit and scope of the application as defined in the appended claims. Accordingly, the breadth and scope of the present application should not be limited by any of the above-described exemplary embodiments, but should be defined only in accordance with the following claims and their equivalents.

What is claimed is:
1. A system in a server, comprising: a processor; and a memory device that stores program code for execution by the processor, the program code comprising: a queue manager that dequeues, from a first persistent queue, a first work item associated with a folder event indicative of an action performed on a first folder; and an API manager that fetches first metadata of the first folder and of immediate children of the first folder; wherein the queue manager enqueues, for the first folder and each immediate child of the first folder, the fetched first metadata in a second persistent queue in a second pass; the API manager fetches second metadata for additional children of the first folder; and the queue manager enqueues the fetched second metadata in the second persistent queue.
2. The system of claim 1, wherein to enqueue, for the first folder and each immediate child of the first folder, the fetched first metadata in a second persistent queue, the queue manager: creates a second work item corresponding to each of the first folder and the immediate children of the first folder that comprises the respective first metadata; and enqueues each second work item in the second persistent queue.
3. The system of claim 2, wherein the system provides each second work item for detection of possible policy violation.
4. The system of claim 1, wherein the API manager fetches metadata for a number of children of the first folder up to a set number of children.
5. The system of claim 4, wherein the API manager dynamically adjusts the set number of children of the first folder for which to fetch metadata.
6. The system of claim 1, wherein the API manager fetches metadata for a set depth of generations of children of the first folder.
7. The system of claim 1, wherein the queue manager: inserts, in an internal queue of the first work item, a child folder identifier for each folder of the children of the first folder; and enqueues the updated first work item into the first persistent queue.
8. A method in a server, comprising: receiving from a cloud-based file system a folder event indicative of an action performed on a first folder, the folder event including at least a folder identifier (ID) of the first folder; dequeuing a first work item associated with the folder event from a first persistent queue; fetching first metadata of the first folder and of immediate children of the first folder; enqueuing, for the first folder and each immediate child of the first folder, the fetched first metadata in a second persistent queue; fetching second metadata for additional children of the first folder; and enqueuing the fetched second metadata in the second persistent queue.
9. The method of claim 8, wherein said enqueuing, for the first folder and each immediate child of the first folder, the fetched first metadata in a second persistent queue comprises: creating a second work item corresponding to each of the first folder and the immediate children of the first folder that comprises the respective first metadata; and enqueuing each second work item in the second persistent queue.
10. The method of claim 9, further comprising: providing each second work item for detection of possible policy violation.
11. The method of claim 8, further comprising: setting a number of children of the first folder for which to fetch metadata; and wherein said fetching first metadata and said second metadata comprises: fetching metadata for a number of children of the first folder up to the set number.
12. The method of claim 11, further comprising: dynamically adjusting the set number of children of the first folder for which to fetch metadata.
13. The method of claim 8, further comprising: setting a depth of generations of children of the first folder for which to fetch metadata; and wherein said fetching first metadata and said second metadata comprises: fetching metadata for the set depth of generations of children of the first folder.
14. The method of claim 8, further comprising: inserting, in an internal queue of the first work item, a child folder identifier for each folder of the children of the first folder; and enqueuing the updated first work item into the first persistent queue.
15. A computer-readable storage medium having program code recorded thereon that when executed by a processor causes the processor to perform a method comprising: receiving from a cloud-based file system a folder event indicative of an action performed on a first folder, the folder event including at least a folder identifier (ID) of the first folder; dequeuing a first work item associated with the folder event from a first persistent queue; fetching first metadata of the first folder and of immediate children of the first folder; enqueuing, for the first folder and each immediate child of the first folder, the fetched first metadata in a second persistent queue; fetching second metadata for additional children of the first folder; and enqueuing the fetched second metadata in the second persistent queue.
16. The computer-readable storage medium of claim 15, wherein said enqueuing, for the first folder and each immediate child of the first folder, the fetched first metadata in a second persistent queue comprises: creating a second work item corresponding to each of the first folder and the immediate children of the first folder that comprises the respective first metadata; and enqueuing each second work item in the second persistent queue.
17. The computer-readable storage medium of claim 16, further comprising: providing each second work item for detection of possible policy violation.
18. The computer-readable storage medium of claim 15, further comprising: setting a number of children of the first folder for which to fetch metadata; and wherein said fetching first metadata and said second metadata comprises: fetching metadata for a number of children of the first folder up to the set number.
19. The computer-readable storage medium of claim 18, further comprising: dynamically adjusting the set number of children of the first folder for which to fetch metadata.
20. The computer-readable storage medium of claim 15, further comprising: setting a depth of generations of children of the first folder for which to fetch metadata; and wherein said fetching first metadata and said second metadata comprises: fetching metadata for the set depth of generations of children of the first folder.